3D Tracking of Human Locomotion: A Tracking as Recognition Approach
Abstract
Estimating the mode (walking/running/standing) and phases of human locomotion is important for video understanding. We present a new "tracking as recognition" approach. A hierarchical finite state machine constructed from 3D motion capture data serves as a prior motion model, and motion templates are used as the observation model. Robustness is achieved by making inferences in the prior motion model, which resolves the short-term ambiguity of the observations that may cause a regular tracking formulation to fail. Experiments show very promising results on some difficult sequences.

This research was supported, in part, by the Advanced Research and Development Agency of the U.S. Government under contract No. MDA908-00-C-0036.

1 Motivation and Introduction

Human motion tracking is important for a wide range of applications in motion recognition, human-computer interaction and computer animation. In particular, tracking human locomotion is key to human motion understanding and visual surveillance. High-precision tracking can also be used for motion capture or gait analysis. In this paper, we present a tracking as recognition approach to recover the modes and phases of human locomotion from monocular sequences; this effectively provides coarse 3D joint angle trajectories of the human. A mode (e.g., walking, running) is defined as a characteristic sequence of motion patterns or states, which we call phases.

A large amount of effort has been put into human locomotion tracking (e.g., [3], [11]). A common formulation is to find the state at the current time, assuming the state at the previous time is known. This formulation has a number of difficulties that prevent it from being robust in realistically noisy situations. Firstly, the state space is large for full-body motion (e.g., a human kinematics model has over 20 DOFs). Secondly, the track may drift when the observations (measurements) are ambiguous; such ambiguity is common when tracking a high-dimensional model using two-dimensional image observations. The Condensation [4] technique alleviates but does not completely solve this problem. Thirdly, prior knowledge about the motion can only be used locally for prediction in tracking (e.g., [11], [8]). Finally, tracking requires knowledge of the initial state; this is a difficult problem, and a user often specifies it manually. These difficulties call for stronger prior knowledge of the motion being studied and stronger temporal integration.

We propose here a tracking as recognition approach, in which recognition of a locomotion mode assists in tracking. We employ a prior human locomotion model constructed from 3D motion capture data of human walking and running, and represent it as a hierarchical finite state machine. The higher-level state machine represents the modes of locomotion, while the lower-level state machine represents the phases of each mode (walking and running only), with each phase corresponding to a characteristic pose. Tracking is performed as follows. In each frame, we measure the likelihood of each state using motion templates. We buffer the observations over a window of frames and compute the best path in the motion model using the Viterbi algorithm. The best path gives the best mode and phase in each frame according to the observations in the window.

In this approach, motion is inferred only within the prior motion model, which greatly constrains the solution space. Short-term ambiguities of measurements are resolved by considering all measurements in the window jointly. The initial state is estimated by inference in the same way as for all other frames. In addition, a high-level description (i.e., the modes) is obtained at the same time. The 3D body motion is estimated from a monocular sequence by using a 3D motion model.

In [10], pioneering work on motion recognition using medical motion data is presented. The joint angle values are represented as continuous curves, and a search is done over the entire cycle to match the predicted edges with image edges.
A Kalman filter is used to track the position and the phase information. The experiments are done only on human walking motion parallel to the image plane. In [7], a similar idea of treating tracking as an inference problem in an HMM is described; however, this approach is appearance-based and works well only for the viewpoints for which the system was trained. In [12], walking recognition was used to verify human hypotheses; however, the algorithm is computationally inefficient and assumes constant walking velocity.

Because of the state-based motion model representation, the precision of the tracking in our method may be too coarse for some applications such as motion capture. However, it can still serve as the first step of a coarse-to-fine approach, in which the estimate is refined locally from a good starting point. We believe that our tracking as recognition approach is a general formulation and can also be applied to other kinds of motion that can be well described by 3D limb trajectories and for which a state-based motion model can be built.

2 Tracking of Human Locomotion

This work builds on [12], where we presented a system to track the 3D positions of multiple humans in complex situations. Human segmentation and tracking are performed on the foreground obtained by statistical background subtraction. A known camera model and the assumption that motion takes place on a ground plane make 3D tracking possible from a single camera. The 3D orientation of the human is inferred by assuming that he or she faces the direction of motion. In this work, we try to recover the mode of motion (e.g., walking/running/standing) and the detailed motion of the limbs using a tracking as recognition approach. We only track the motion of the legs, but the coarse motion of the other parts is also obtained because we employ a full-body prior motion model.
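The tracking step described in the introduction (per-frame likelihoods for each state, a buffered window of frames, and a Viterbi best path through the mode/phase state machine) can be sketched as Viterbi decoding over a flattened two-level state space. This is a minimal sketch, not the authors' implementation. The phase counts, the stay/switch probabilities (`p_stay`, `p_switch`), the rule that a mode switch enters phase 0 of the new mode, and the helper names `build_states`, `transition_logprob` and `viterbi` are all hypothetical. The per-frame log-likelihoods stand in for the motion-template scores and are taken as given.

```python
import math
from typing import Dict, List, Sequence, Tuple

# A state of the two-level machine: (mode name, phase index within that mode).
State = Tuple[str, int]
NEG_INF = float("-inf")


def build_states(phases_per_mode: Dict[str, int]) -> List[State]:
    """Flatten the mode level and the phase level into a single state list."""
    return [(mode, p) for mode, n in phases_per_mode.items() for p in range(n)]


def transition_logprob(src: State, dst: State, phases_per_mode: Dict[str, int],
                       p_stay: float = 0.4, p_switch: float = 0.02) -> float:
    """Hypothetical transition model: stay in a phase, advance cyclically to the
    next phase, or switch to the first phase of another mode."""
    mode, phase = src
    n_phases = phases_per_mode[mode]
    n_other = len(phases_per_mode) - 1
    if dst[0] != mode:
        # Mode switch (upper level); assumed to enter phase 0 of the new mode.
        return math.log(p_switch) if dst[1] == 0 else NEG_INF
    if n_phases == 1:
        # Single-phase mode (e.g. standing): the only in-mode move is a self-loop.
        return math.log(1.0 - p_switch * n_other)
    if dst[1] == phase:
        return math.log(p_stay)
    if dst[1] == (phase + 1) % n_phases:
        return math.log(1.0 - p_stay - p_switch * n_other)
    return NEG_INF


def viterbi(loglik: Sequence[Sequence[float]], states: List[State],
            phases_per_mode: Dict[str, int]) -> List[State]:
    """Return the most likely state sequence for a buffered window of frames.

    loglik[t][j] is the observation log-likelihood of states[j] at frame t
    (a stand-in for the motion-template scores).
    """
    n = len(states)
    trans = [[transition_logprob(a, b, phases_per_mode) for b in states] for a in states]
    # Uniform prior: the initial state is inferred like every other frame.
    delta = [math.log(1.0 / n) + loglik[0][j] for j in range(n)]
    backptrs: List[List[int]] = []
    for t in range(1, len(loglik)):
        new_delta, ptr = [], []
        for j in range(n):
            best_i = max(range(n), key=lambda i: delta[i] + trans[i][j])
            new_delta.append(delta[best_i] + trans[best_i][j] + loglik[t][j])
            ptr.append(best_i)
        delta = new_delta
        backptrs.append(ptr)
    # Backtrack from the best final state.
    j = max(range(n), key=lambda k: delta[k])
    path = [j]
    for ptr in reversed(backptrs):
        j = ptr[j]
        path.append(j)
    path.reverse()
    return [states[k] for k in path]


if __name__ == "__main__":
    phases = {"standing": 1, "walking": 4}  # hypothetical phase counts
    states = build_states(phases)
    # Frame 2 is ambiguous: "standing" scores as well as walking phase 2.
    strong = [{("walking", 0)}, {("walking", 1)}, {("walking", 2), ("standing", 0)},
              {("walking", 3)}, {("walking", 0)}]
    loglik = [[0.0 if s in frame else -5.0 for s in states] for frame in strong]
    print(viterbi(loglik, states, phases))
```

In the demo, frame 2 alone cannot separate standing from walking phase 2. The decoded path keeps walking phase 2 because the frames before and after it only fit a continuing walking cycle. This illustrates, under the assumed transition model, how decoding over a window resolves a short-term ambiguity that a frame-to-frame tracker could get wrong.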
2.1 The hierarchical locomotion model

Human locomotion has many modes, among which walking, running and standing are the three most often seen in daily life. A human can switch between these modes; therefore, the relationship among the modes is naturally represented as a finite state machine, as shown in Fig. 1(a). The speed of the body is an important feature for distinguishing among these three modes. The prior probability distribution of the speed given each mode is set according to previous research [1]. This finite state machine, together with the associated feature, constitutes the first level of our hierarchical locomotion model.

A more detailed model is needed to track the motion of the limbs during walking and running. Walking and running are both periodic motions. We define a cycle to be the minimum repetitive unit, which equals two steps. For each mode, several 3D motion capture sequences are gathered to compute an average cycle, which starts from the phase in which the right leg crosses the left leg and moves forward. 3D motion capture data, consisting of a human kinematics model and a sequence of joint angle values, is a concise
Publication date: 2002